Skip to content

fix(pr-management-triage): four classifier heuristic fixes from a live session#344

Merged
potiuk merged 4 commits into
apache:mainfrom
potiuk:pr-management-triage/heuristic-fixes
May 27, 2026
Merged

fix(pr-management-triage): four classifier heuristic fixes from a live session#344
potiuk merged 4 commits into
apache:mainfrom
potiuk:pr-management-triage/heuristic-fixes

Conversation

@potiuk

@potiuk potiuk commented May 27, 2026

Copy link
Copy Markdown
Member

Summary

Four small, independent classifier fixes calibrated from a real triage session on apache/airflow (2026-05-27). Each was a case where the rule fired technically correctly but the maintainer manually overrode the proposed action because the rule didn't match real-world signal. Each commit is independently revertible.

Changes

Commit Change Why
9afd558 Match Copilot bots without requiring [bot] suffix in login GitHub's GraphQL Actor.login returns copilot-pull-request-reviewer without the [bot] suffix; strict suffix matching excluded real Copilot threads from row 2 (stale_copilot_review). Saw a PR with 55-day-old unresolved Copilot threads classify as comment instead of draft.
91b5ead Insert row 12b: mixed static-check + non-static failures → comment, not rerun Row 12 fired only when ALL failed checks were static; row 13 then proposed rerun even when one of two failures was a static check. Rerunning a static-check failure is wasted — needs a code fix. Saw a PR with Static checks: FAILURE + Compat 3.0.6: FAILURE (one static + one possibly-flaky) classify as rerun.
878ab1f Widen F5b to walk the last 5 collaborator comments, not just the latest F5b only inspected the most recent collab comment for unanswered @-mentions. Missed the case where maintainer A pings B in comment N, viewer posts a quality comment in N+1 (orthogonal), and B's reply is still pending. Saw on a PR where vincbeck pinged bbovenzi 2d ago, viewer drafted yesterday for quality issues, classifier proposed request-author-confirmation today — talking over the older ping.
b284aff Add row 0 first_time_stale_abandonedskip Row 1 (approve-workflow) fired for first-time-contributor PRs with no push for 2+ months since prior triage. Re-approving CI on stalled code re-fails on the same unaddressed quality issues. New row 0 (FIRST_TIMER + viewer triage marker + no commits since marker + ≥30d) routes to skip; stale-sweep retires it.

Origin

Came out of the same triage session on apache/airflow (2026-05-27) that produced PR #343 (session-history gist persistence). The session's gist captured each manual override and the four patterns above stood out as systematic classifier mis-calibrations rather than per-PR judgment calls. PR #343 builds the infrastructure to make this kind of cross-session signal repeatable.

Test plan

  • Copilot match: PR with unresolved review thread by copilot-pull-request-reviewer (no [bot] suffix) ≥ 7d old → routes to stale_copilot_review
  • Row 12b: PR with mixed static + non-static failures → routes to comment
  • F5b deep scan: PR with maintainer A→B @-mention 5 comments ago, viewer comment in between → routes to F5b skip
  • Row 0: FIRST_TIMER with viewer triage marker, no commits since, ≥30d → routes to skip, NOT approve-workflow

🤖 Generated with Claude Code (Opus 4.7)

potiuk added 4 commits May 27, 2026 22:41
…]` suffix

The `copilot_review_stale` precondition (row 2 of the decision table)
listed `copilot-pull-request-reviewer[bot]`, `copilot[bot]`, etc. as
the matchable logins. GitHub's GraphQL `Actor.login` field returns
some of these without the `[bot]` suffix — most visibly for
`copilot-pull-request-reviewer` itself, which appears as
`copilot-pull-request-reviewer` (no suffix) on the PR's `author.login`
even though the underlying account is the same bot.

Strict `[bot]`-suffix matching excludes those threads from row 2.
Real-world impact seen during triage on `apache/airflow`: a PR with
two stale (55-day-old) Copilot review threads classified as
`deterministic_flag → comment` (only static-check failure surfaced)
instead of `stale_copilot_review → draft`. The maintainer had to
manually re-route the action.

Relax the matcher to "case-insensitive substring `copilot` in login;
`[bot]` suffix optional". The 7-day age threshold and the
no-author-reply qualifier are unchanged — only the login-string
match is widened to align with what GraphQL actually returns.
…ent`, not `rerun`

Insert decision row 12b between rows 12 and 13. The original ordering
left a hole: row 12 fires only when *every* failed check is a static
check; if even one non-static failure was mixed in, row 12 missed and
row 13 (`failed_count <= 2 -> rerun`) caught the case and proposed a
rerun. Rerunning is a waste when at least one of the failures is a
static check that needs a code fix to pass.

Real-world impact seen during triage on `apache/airflow`: a PR with
`CI image checks / Static checks: FAILURE` plus
`provider distributions tests / Compat 3.0.6: FAILURE` (one static
+ one possibly-flaky compat test, total = 2 failures) classified as
`deterministic_flag -> rerun`. The maintainer overrode to `comment`
because the static-check failure will deterministically re-fail
without a code change.

Row 12b is intentionally placed *after* row 12 so the all-static
case still takes the more specific row (same action, different
reason template clarifies which case the maintainer is seeing).
…comments

F5b previously inspected only the *most recent* collaborator comment
for an unanswered `@`-mention to another maintainer. That misses the
common case where:

  1. Maintainer A pings maintainer B for an opinion ("@b what do
     you think of this approach?") in comment #N
  2. Maintainer C (often the viewer) posts a quality-criteria
     comment in comment #N+1 because of an orthogonal issue (CI
     red, merge conflicts, etc.)

The A->B ping is still outstanding — B has not replied — but F5b
only looks at comment #N+1, doesn't see the older mention, and the
PR slides into the decision table. The maintainer ends up auto-
proposing actions on a PR where two maintainers are mid-conversation.

Real-world impact seen during triage on `apache/airflow`: a PR
where one maintainer asked another for an opinion 2 days ago, then
the viewer drafted the PR yesterday for quality issues. The
classifier proposed `request-author-confirmation` today — talking
over the older unanswered ping.

Widen F5b to walk the last 5 collaborator comments, newest-first,
and fire on the first one matching the unanswered-`@`-mention
condition. The 5-comment window matches the bounded depth the
batch query already fetches; no additional GraphQL cost.
…imer PRs

Decision row 1 fires `approve-workflow` for any first-time-contributor
PR with no real CI run, regardless of how long ago the maintainer
already gave the contributor feedback or how long the contributor has
been silent. The result: a PR that was triaged 8+ weeks ago, with no
new commits since, keeps surfacing every triage sweep and the
maintainer either has to re-approve CI on stalled code (which then
re-fails on the same unaddressed quality issues) or override to `skip`
manually.

Add row 0 + the `first_time_stale_abandoned` precondition. The row
fires *before* row 1 and routes to `skip` when all of these hold:

  - author is FIRST_TIME_CONTRIBUTOR / FIRST_TIMER,
  - a prior viewer triage marker exists,
  - no commits have been pushed since the marker, and
  - the PR's head commit is >= 30 days old.

The PR is then left to the stale-sweep flow (sweep 1b — "untriaged
draft >= 14 days" — or its successor), which is the appropriate
closure pathway.

The 30-day threshold is deliberately longer than the grace windows
(24h / 96h) and the F5a cooldown (72h). It captures *abandonment*,
not slow response — a contributor who replies within a week and
then stalls for another week is not abandoned, just busy.

Real-world impact seen during triage on `apache/airflow`: a
first-time-contributor PR triaged on 2026-04-02 with no push since
classified as `approve-workflow` 8 weeks later. The maintainer
skipped it manually; row 0 retires it automatically.
@potiuk potiuk merged commit 52f65e1 into apache:main May 27, 2026
15 checks passed
potiuk added a commit to apache/airflow that referenced this pull request May 28, 2026
* Update apache-steward snapshot to 5c211a4

Bumps the local apache-steward snapshot from 339d3eb to 5c211a4 (22
upstream commits). The only committed change in this PR is a
1-line frontmatter addition (capability: capability:setup) to
.github/skills/setup-steward/SKILL.md, propagated from the new
framework version via /setup-steward upgrade. Everything else
lives in the gitignored .apache-steward/ snapshot.

Highlights from upstream (apache/airflow-steward):

- pr-management-triage: session-history gist persistence Step 6b
  (apache/magpie#343), four classifier heuristic fixes
  (apache/magpie#344), fetch-all-upfront pattern
  (apache/magpie#346)
- security-issue-triage: fetch-all-upfront analogue
  (apache/magpie#347)
- Framework labels + capability taxonomy (apache/magpie#340) —
  the source of the frontmatter line in this PR
- New skill pairing-self-review and tool spec-status-index
- claude-code pin 2.1.141 -> 2.1.150

/setup-steward upgrade ran cleanly locally: snapshot refreshed,
symlinks resolve, post-checkout hook in sync,
sandbox-add-project-root reconciled across 3 worktrees.
.apache-steward.local.lock updated to fetched_commit 5c211a4.
All .apache-steward-overrides/ files unchanged.

* Gitignore .apache-steward.session-state.json

Adds the per-machine session-state file to .gitignore. The file is
written by steward skills that maintain adopter-local persistence
anchors — currently pr-management-triage Step 6b's session-history
gist URL (apache/magpie#343), but the structure is
deliberately shared so other skills can add their own keys later.

The file is per-user, per-machine state; it should never be
committed even when a contributor stages everything with `git add -A`.
choo121600 pushed a commit to apache/airflow that referenced this pull request May 29, 2026
* Update apache-steward snapshot to 5c211a4

Bumps the local apache-steward snapshot from 339d3eb to 5c211a4 (22
upstream commits). The only committed change in this PR is a
1-line frontmatter addition (capability: capability:setup) to
.github/skills/setup-steward/SKILL.md, propagated from the new
framework version via /setup-steward upgrade. Everything else
lives in the gitignored .apache-steward/ snapshot.

Highlights from upstream (apache/airflow-steward):

- pr-management-triage: session-history gist persistence Step 6b
  (apache/magpie#343), four classifier heuristic fixes
  (apache/magpie#344), fetch-all-upfront pattern
  (apache/magpie#346)
- security-issue-triage: fetch-all-upfront analogue
  (apache/magpie#347)
- Framework labels + capability taxonomy (apache/magpie#340) —
  the source of the frontmatter line in this PR
- New skill pairing-self-review and tool spec-status-index
- claude-code pin 2.1.141 -> 2.1.150

/setup-steward upgrade ran cleanly locally: snapshot refreshed,
symlinks resolve, post-checkout hook in sync,
sandbox-add-project-root reconciled across 3 worktrees.
.apache-steward.local.lock updated to fetched_commit 5c211a4.
All .apache-steward-overrides/ files unchanged.

* Gitignore .apache-steward.session-state.json

Adds the per-machine session-state file to .gitignore. The file is
written by steward skills that maintain adopter-local persistence
anchors — currently pr-management-triage Step 6b's session-history
gist URL (apache/magpie#343), but the structure is
deliberately shared so other skills can add their own keys later.

The file is per-user, per-machine state; it should never be
committed even when a contributor stages everything with `git add -A`.
(cherry picked from commit c521078)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
vatsrahul1001 pushed a commit to apache/airflow that referenced this pull request May 29, 2026
* Update apache-steward snapshot to 5c211a4

Bumps the local apache-steward snapshot from 339d3eb to 5c211a4 (22
upstream commits). The only committed change in this PR is a
1-line frontmatter addition (capability: capability:setup) to
.github/skills/setup-steward/SKILL.md, propagated from the new
framework version via /setup-steward upgrade. Everything else
lives in the gitignored .apache-steward/ snapshot.

Highlights from upstream (apache/airflow-steward):

- pr-management-triage: session-history gist persistence Step 6b
  (apache/magpie#343), four classifier heuristic fixes
  (apache/magpie#344), fetch-all-upfront pattern
  (apache/magpie#346)
- security-issue-triage: fetch-all-upfront analogue
  (apache/magpie#347)
- Framework labels + capability taxonomy (apache/magpie#340) —
  the source of the frontmatter line in this PR
- New skill pairing-self-review and tool spec-status-index
- claude-code pin 2.1.141 -> 2.1.150

/setup-steward upgrade ran cleanly locally: snapshot refreshed,
symlinks resolve, post-checkout hook in sync,
sandbox-add-project-root reconciled across 3 worktrees.
.apache-steward.local.lock updated to fetched_commit 5c211a4.
All .apache-steward-overrides/ files unchanged.

* Gitignore .apache-steward.session-state.json

Adds the per-machine session-state file to .gitignore. The file is
written by steward skills that maintain adopter-local persistence
anchors — currently pr-management-triage Step 6b's session-history
gist URL (apache/magpie#343), but the structure is
deliberately shared so other skills can add their own keys later.

The file is per-user, per-machine state; it should never be
committed even when a contributor stages everything with `git add -A`.
(cherry picked from commit c521078)

Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant